xen.git
15 years agox86 xsave: supports xsave (CPUID:0xD) enumeration for all sub-leaves.
Keir Fraser [Fri, 24 Dec 2010 08:39:42 +0000 (08:39 +0000)]
x86 xsave: supports xsave (CPUID:0xD) enumeration for all sub-leaves.

Specifically, it fixes the following issues:

1. The sub-leaves of CPUID:0x0000000D aren't contiguous. Hypervisor
shouldn't use register values to stop the enumeration. This patch
moves checking on XSAVE sub-leaves out of if-else statement. It also
bumps up sub-leaves to 63.
2. It creates a common function for xsave.
3. The main leaf 0 of CPUID:0x0000000D in current Xen is broken,
especially ECX and EBX registers. This patch cleans it up.
4. It adds support for detecting the EBX value of CPUID:0x0000000D main
leaf 0 on-the-fly.
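
A minimal sketch of the enumeration idea in item 1, assuming a hypothetical
callback that returns the registers for one sub-leaf of leaf 0xD. The names
(cpuid_regs, cpuid_fn, xsave_collect_subleaves, fake_cpuid_0xd) and the mock
values are illustrative and are not taken from the patch. An empty sub-leaf
is skipped instead of being treated as the end of the list:

```c
#include <stddef.h>
#include <stdint.h>

#define XSAVE_MAX_SUBLEAF 63  /* sub-leaves 0..62 are scanned */

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Hypothetical callback: fills *out for CPUID leaf 0xD, given sub-leaf. */
typedef void (*cpuid_fn)(uint32_t subleaf, struct cpuid_regs *out);

/*
 * Collect every populated state-component sub-leaf (2 and above).
 * A sub-leaf reporting size 0 is skipped; scanning always continues
 * up to the fixed limit because the populated sub-leaves need not be
 * contiguous.
 */
size_t xsave_collect_subleaves(cpuid_fn query, uint32_t *found, size_t max)
{
    size_t n = 0;

    for (uint32_t sub = 2; sub < XSAVE_MAX_SUBLEAF && n < max; sub++) {
        struct cpuid_regs r;

        query(sub, &r);
        if (r.eax == 0)       /* component absent: keep scanning */
            continue;
        found[n++] = sub;
    }
    return n;
}

/* Mock with a gap: only sub-leaves 2 and 62 exist (values illustrative). */
void fake_cpuid_0xd(uint32_t subleaf, struct cpuid_regs *out)
{
    out->eax = out->ebx = out->ecx = out->edx = 0;
    if (subleaf == 2) {
        out->eax = 256;
        out->ebx = 576;
    } else if (subleaf == 62) {
        out->eax = 128;
        out->ebx = 832;
    }
}
```

A loop that stopped at the first all-zero sub-leaf would never reach
sub-leaf 62 in this mock.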

Signed-off-by: Wei Huang <wei.huang2@amd.com>
15 years agox86 xsave: Enable xsave_feature[62] (AMD Lightweight Profiling)
Keir Fraser [Fri, 24 Dec 2010 08:38:22 +0000 (08:38 +0000)]
x86 xsave: Enable xsave_feature[62] (AMD Lightweight Profiling)

The spec of LWP is available at
http://developer.amd.com/cpu/lwp/Pages/default.aspx.

Signed-off-by: Wei Huang <wei.huang2@amd.com>
15 years agox86 xsave: Fix 64bit xsave_feature support for set_xcr0().
Keir Fraser [Fri, 24 Dec 2010 08:37:34 +0000 (08:37 +0000)]
x86 xsave: Fix 64bit xsave_feature support for set_xcr0().

Signed-off-by: Wei Huang <wei.huang2@amd.com>
15 years agocredit2: On debug keypress print load average as a fraction
Keir Fraser [Fri, 24 Dec 2010 08:32:43 +0000 (08:32 +0000)]
credit2: On debug keypress print load average as a fraction

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Different unbalance tolerance for underloaded and overloaded queues
Keir Fraser [Fri, 24 Dec 2010 08:32:20 +0000 (08:32 +0000)]
credit2: Different unbalance tolerance for underloaded and overloaded queues

Allow the "unbalance tolerance" -- the amount of difference between
two runqueues that will be allowed before rebalancing -- to differ
depending on how busy the runqueue is.  If it's less than 100%,
default to a difference of 1.0; if it's more than 100%, default to a
tolerance of 0.125.
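
The rule above can be sketched as follows. This is an illustration only: it
uses floating point for readability, assumes the tolerance is chosen from
the local runqueue's load, and the names are hypothetical rather than the
scheduler's own:

```c
#include <stdbool.h>

/* Loads are fractions of one cpu: 1.0 means 100% busy. */
#define TOLERANCE_UNDERLOADED 1.0
#define TOLERANCE_OVERLOADED  0.125

/* Return true when the load difference exceeds the allowed tolerance. */
bool should_rebalance(double my_load, double other_load)
{
    double diff = my_load > other_load ? my_load - other_load
                                       : other_load - my_load;
    double tolerance = (my_load < 1.0) ? TOLERANCE_UNDERLOADED
                                       : TOLERANCE_OVERLOADED;

    return diff > tolerance;
}
```

With these defaults an underloaded queue tolerates a large imbalance, while
an overloaded queue rebalances on a much smaller one.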

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Introduce a loadavg-based load balancer
Keir Fraser [Fri, 24 Dec 2010 08:31:54 +0000 (08:31 +0000)]
credit2: Introduce a loadavg-based load balancer

This is a first cut at load balancing.  I'm first working on the
behavior I want to get correct; once I know what kind of behavior
works well, I'll work on making it efficient.

The general idea is when balancing runqueues, look for the runqueue
whose loadavg is the most different from ours (higher or lower).
Then, look for a transaction which will bring the loads closest
together: either pushing a vcpu, pulling a vcpu, or swapping them.
Use the per-vcpu load to calculate the expected load after the
exchange.

The current algorithm looks at every combination, which is O(N^2).
That's not going to be suitable for workloads with large numbers of
vcpus (such as highly consolidated VDI deployments).  I'll make a more
efficient algorithm once I've experimented and determined what I think
is the best load-balancing behavior.
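
A rough sketch of the exhaustive search described above, assuming each
runqueue is represented by an array of per-vcpu loads. The types and names
(move_kind, struct move, best_move) are hypothetical. It tries every push,
every pull and every swap, which is where the O(N^2) cost comes from:

```c
#include <stddef.h>

enum move_kind { MOVE_NONE, MOVE_PUSH, MOVE_PULL, MOVE_SWAP };

struct move {
    enum move_kind kind;
    int ours;      /* index of our vcpu, or -1 */
    int theirs;    /* index of their vcpu, or -1 */
    double delta;  /* expected load difference after the move */
};

static double absd(double x) { return x < 0 ? -x : x; }

static double sum(const double *v, size_t n)
{
    double s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

struct move best_move(const double *ours, size_t n_ours,
                      const double *theirs, size_t n_theirs)
{
    double lo = sum(ours, n_ours), lt = sum(theirs, n_theirs);
    struct move best = { MOVE_NONE, -1, -1, absd(lo - lt) };

    for (size_t i = 0; i < n_ours; i++) {            /* push ours[i] */
        double d = absd((lo - ours[i]) - (lt + ours[i]));
        if (d < best.delta)
            best = (struct move){ MOVE_PUSH, (int)i, -1, d };
    }
    for (size_t j = 0; j < n_theirs; j++) {          /* pull theirs[j] */
        double d = absd((lo + theirs[j]) - (lt - theirs[j]));
        if (d < best.delta)
            best = (struct move){ MOVE_PULL, -1, (int)j, d };
    }
    for (size_t i = 0; i < n_ours; i++)              /* swap every pair */
        for (size_t j = 0; j < n_theirs; j++) {
            double shift = ours[i] - theirs[j];
            double d = absd((lo - shift) - (lt + shift));
            if (d < best.delta)
                best = (struct move){ MOVE_SWAP, (int)i, (int)j, d };
        }
    return best;
}
```

A move is only chosen when it strictly reduces the expected difference, so
two already balanced queues yield MOVE_NONE.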

At the moment, balance from a runqueue every time the credit resets.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Use loadavg to pick cpus, instead of instantaneous load
Keir Fraser [Fri, 24 Dec 2010 08:31:24 +0000 (08:31 +0000)]
credit2: Use loadavg to pick cpus, instead of instantaneous load

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Migrate request infrastructure
Keir Fraser [Fri, 24 Dec 2010 08:31:04 +0000 (08:31 +0000)]
credit2: Migrate request infrastructure

Put in infrastructure to allow a vcpu to request to migrate to a
specific runqueue.  This will allow a load balancer to choose running
VMs to migrate, and know they will go where expected when the VM is
descheduled.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Track expected load
Keir Fraser [Fri, 24 Dec 2010 08:30:42 +0000 (08:30 +0000)]
credit2: Track expected load

As vcpus are migrated, track how we expect the load to change.  This
helps smooth migrations when the balancing doesn't take immediate
effect on the load average.  In theory, if vcpu activity remains
constant, then the measured avgload should converge to the balanced
avgload.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Track average load contributed by a vcpu
Keir Fraser [Fri, 24 Dec 2010 08:30:15 +0000 (08:30 +0000)]
credit2: Track average load contributed by a vcpu

Track the amount of load contributed by a particular vcpu, to help
us make informed decisions about what will happen if we make a move.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Calculate load average
Keir Fraser [Fri, 24 Dec 2010 08:29:53 +0000 (08:29 +0000)]
credit2: Calculate load average

Calculate a per-runqueue decaying load average.
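
One common way to keep a decaying average is an exponentially weighted
update in fixed point. The sketch below assumes that approach; the shift
constants and names are illustrative and may differ from what the patch
actually does:

```c
#include <stdint.h>

#define LOAD_SHIFT  10  /* fixed point: 1.0 == 1 << LOAD_SHIFT */
#define DECAY_SHIFT  3  /* each update keeps 7/8 of the old average */

struct runq_load {
    uint32_t avgload;   /* decaying average, in fixed point */
};

/* Fold the current number of active vcpus into the decaying average. */
void runq_load_update(struct runq_load *rq, unsigned int active_vcpus)
{
    uint32_t sample = (uint32_t)active_vcpus << LOAD_SHIFT;

    rq->avgload = rq->avgload - (rq->avgload >> DECAY_SHIFT)
                  + (sample >> DECAY_SHIFT);
}
```

Older samples fade geometrically, so a burst of activity raises the average
quickly and an idle period lets it drain away.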

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Detect socket layout and assign one runqueue per socket
Keir Fraser [Fri, 24 Dec 2010 08:29:27 +0000 (08:29 +0000)]
credit2: Detect socket layout and assign one runqueue per socket

Because alloc_pdata() is called before the cpu layout information is
available, we grab a callback to the newly-created CPU_STARTING
notifier.

cpu 0 doesn't get a callback, so we simply hard-code it to runqueue 0.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Simple cpu picker based on instantaneous load
Keir Fraser [Fri, 24 Dec 2010 08:29:00 +0000 (08:29 +0000)]
credit2: Simple cpu picker based on instantaneous load

In preparation for multiple runqueues, add a simple cpu picker that
will look for the runqueue with the lowest instantaneous load to assign
the vcpu to.
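
As an illustration of such a picker, assuming per-runqueue load counters and
an "active" flag per runqueue (both hypothetical representations):

```c
#include <stddef.h>

/*
 * Return the index of the active runqueue with the lowest instantaneous
 * load, or -1 if no runqueue is active.  Ties go to the lowest index.
 */
int pick_runqueue(const unsigned int *load, const unsigned char *active,
                  size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!active[i])
            continue;
        if (best < 0 || load[i] < load[(size_t)best])
            best = (int)i;
    }
    return best;
}
```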

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Calculate instantaneous runqueue load
Keir Fraser [Fri, 24 Dec 2010 08:28:35 +0000 (08:28 +0000)]
credit2: Calculate instantaneous runqueue load

Add hooks in the various places to detect vcpus becoming active or
inactive.  At the moment, record only instantaneous runqueue load;
but this lays the groundwork for having a load average.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Handle runqueue changes
Keir Fraser [Fri, 24 Dec 2010 08:28:10 +0000 (08:28 +0000)]
credit2: Handle runqueue changes

In preparation for cross-runqueue migration, make changes to make that
more robust.  Changes include:
* An up-pointer from the svc struct to the runqueue it's assigned to
* Explicit runqueue assigns/de-assigns, with appropriate ASSERTs
* cpu_pick will de-assign a vcpu from a runqueue if it's migrating,
and wake will re-assign it

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Refactor runqueue initialization
Keir Fraser [Fri, 24 Dec 2010 08:27:42 +0000 (08:27 +0000)]
credit2: Refactor runqueue initialization

Several refactorings:
* Add prv->initialized cpu mask
* Replace prv->runq_count with active_queue mask
* Replace rqd->cpu_min,cpu_mask with  active cpu mask
* Put locks in the runqueue structure, rather than borrowing the
existing cpu locks
* init() initializes all runqueues to NULL, inactive, and maps all
pcpus to runqueue -q
* alloc_pcpu() will add cpus to runqueues, "activating" the runqueue
if necessary.  All cpus are currently assigned to runqueue 0.

End-to-end behavior of the system should remain largely the same.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agoscheduler: Introduce pcpu_schedule_lock
Keir Fraser [Fri, 24 Dec 2010 08:26:59 +0000 (08:26 +0000)]
scheduler: Introduce pcpu_schedule_lock

Many places in Xen, particularly schedule.c, grab the per-cpu spinlock
directly, rather than through vcpu_schedule_lock().  Since the lock
pointer may change between the time it's read and the time the lock is
successfully acquired, we need to check after acquiring the lock to
make sure that the pcpu's lock hasn't changed, due to cpu
initialization or cpupool activity.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agoscheduler: Update vcpu_schedule_lock to check for changed lock pointer as well
Keir Fraser [Fri, 24 Dec 2010 08:26:29 +0000 (08:26 +0000)]
scheduler: Update vcpu_schedule_lock to check for changed lock pointer as well

Credit2 has different cpus share a lock, which means that as cpus are
added, and as they're moved between pools, the pointer to the
scheduler lock may also change.

Since we don't want to have to grab a lock before grabbing the per-cpu
scheduler lock, we use the lock itself to protect against the pointer
changing.

However, since it may change between reading and locking, after we
grab the lock we need to check to make sure it's still the right one.

Update the vcpu_schedule_lock() definition to reflect this: both
v->processor and that processor's schedule lock are liable to change;
check both after grabbing the lock, and release / re-acquire if
necessary.
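
The acquire-then-recheck pattern can be sketched in userspace with C11
atomics. This is not the hypervisor's code: the spinlock, the struct layouts
and the function name vcpu_lock_sketch are stand-ins chosen for the example:

```c
#include <stdatomic.h>

typedef struct { atomic_flag held; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* spin until the holder releases */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct pcpu { _Atomic(spinlock_t *) schedule_lock; };
struct vcpu { _Atomic int processor; };

/*
 * Lock the scheduler lock of the cpu the vcpu is on.  Both the vcpu's
 * processor and that processor's lock pointer may change while we wait,
 * so re-read both after acquiring and retry if either moved.
 */
spinlock_t *vcpu_lock_sketch(struct vcpu *v, struct pcpu *cpus)
{
    for (;;) {
        int cpu = atomic_load(&v->processor);
        spinlock_t *lock = atomic_load(&cpus[cpu].schedule_lock);

        spin_lock(lock);
        if (cpu == atomic_load(&v->processor) &&
            lock == atomic_load(&cpus[cpu].schedule_lock))
            return lock;        /* still the right lock: keep it */
        spin_unlock(lock);      /* stale: release and try again */
    }
}
```

The caller releases the returned lock pointer rather than re-reading it
from the cpu structure, since the pointer may change again later.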

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agocredit2: Quieten some debug messages
Keir Fraser [Fri, 24 Dec 2010 08:25:54 +0000 (08:25 +0000)]
credit2: Quieten some debug messages

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
15 years agoSupport new xl command cpupool-numa-split
Juergen Gross [Thu, 9 Dec 2010 10:50:53 +0000 (11:50 +0100)]
Support new xl command cpupool-numa-split

New xl command cpupool-numa-split which will create one cpupool for each
numa node of the machine. Can be called only if no cpupools other than
Pool 0 are defined. After creation the cpupools can be managed as usual.

Signed-off-by: juergen.gross@ts.fujitsu.com
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agoSupport renaming of cpupools
Juergen Gross [Thu, 9 Dec 2010 12:37:32 +0000 (13:37 +0100)]
Support renaming of cpupools

Add a new library function libxl_cpupool_rename() and a new xl command
xl cpupool-rename to support renaming of cpupools.

Signed-off-by: juergen.gross@ts.fujitsu.com
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agoExtend cpupools to support numa
Juergen Gross [Thu, 9 Dec 2010 10:26:37 +0000 (11:26 +0100)]
Extend cpupools to support numa

The user interfaces for cpupools are extended to support numa machines:
- xl cpupool-create now supports specifying a node list instead of a cpu list.
  The new cpupool will be created with all free cpus of the specified numa
  nodes.
- xl cpupool-cpu-remove and xl cpupool-cpu-add can take a node number instead
  of a cpu number. Using 'node:1' for the cpu parameter will, depending on
  the operation, either remove all cpus of node 1 in the specified cpupool,
  or add all free cpus of node 1 to the cpupool.

libxl is extended with the following functions to support this feature:
int libxl_cpupool_cpuadd_node(libxl_ctx *ctx, uint32_t poolid, int node, int *cpus)
int libxl_cpupool_cpuremove_node(libxl_ctx *ctx, uint32_t poolid, int node, int *cpus)
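
For illustration, the 'node:1' form of the cpu parameter could be told apart
from a plain cpu number along these lines. The helper parse_cpu_arg is
hypothetical and is not part of xl or libxl:

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse either "N" (a cpu number) or "node:N" (a numa node number).
 * Returns 0 on success and fills *is_node and *number; returns -1 if
 * the argument is malformed.
 */
int parse_cpu_arg(const char *arg, int *is_node, int *number)
{
    const char *digits = arg;
    char *end;
    long val;

    *is_node = 0;
    if (strncmp(arg, "node:", 5) == 0) {
        *is_node = 1;
        digits = arg + 5;
    }
    if (*digits < '0' || *digits > '9')
        return -1;              /* empty or non-numeric */
    val = strtol(digits, &end, 10);
    if (*end != '\0' || val > INT_MAX)
        return -1;              /* trailing junk or out of range */
    *number = (int)val;
    return 0;
}
```

A caller would then dispatch to the node-based function when is_node is set
and to the existing per-cpu function otherwise.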

Signed-off-by: juergen.gross@ts.fujitsu.com
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agosupport topology info in xl info
Juergen Gross [Thu, 9 Dec 2010 10:23:20 +0000 (11:23 +0100)]
support topology info in xl info

Adds option -n/--numa to xl info command to print topology information.
No numa information up to now, as I've no machine which will give this info
via xm info (could be a bug in xm, however).

Signed-off-by: juergen.gross@ts.fujitsu.com
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agoSupport getting topology info in libxl
Juergen Gross [Thu, 9 Dec 2010 10:21:30 +0000 (11:21 +0100)]
Support getting topology info in libxl

Added new function libxl_get_topologyinfo() to obtain this information from
hypervisor.

Signed-off-by: juergen.gross@ts.fujitsu.com
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: refactor Linux OS interface into a separate file.
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: refactor Linux OS interface into a separate file.

This helps ensure that the osdep abstraction is complete by
allowing us to avoid including xc_private.h.

All the other OS backends could benefit from the same treatment, but
since I cannot compile-test them I did not do this.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: move foreign memory functions to xc_foreign_memory.c
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: move foreign memory functions to xc_foreign_memory.c

Now that this file exists it is a better home for these than xc_misc.c

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: add ability to dynamically load osdep.
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: add ability to dynamically load osdep.

Add a dummy backend which always returns ENOSYS. Mainly as a compile
time testbed rather than because it is a useful backend.
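
A dummy backend of this kind might look roughly like the sketch below. The
operations table (struct osdep_ops) and its members are hypothetical and do
not reproduce the real libxc osdep interface:

```c
#include <errno.h>

/* Hypothetical table of OS-dependent operations. */
struct osdep_ops {
    int (*open)(void);
    int (*close)(int handle);
    int (*hypercall)(int handle, void *arg);
};

/* Every operation fails the same way: not implemented. */
static int dummy_open(void)
{
    errno = ENOSYS;
    return -1;
}

static int dummy_close(int handle)
{
    (void)handle;
    errno = ENOSYS;
    return -1;
}

static int dummy_hypercall(int handle, void *arg)
{
    (void)handle;
    (void)arg;
    errno = ENOSYS;
    return -1;
}

const struct osdep_ops dummy_osdep_ops = {
    .open = dummy_open,
    .close = dummy_close,
    .hypercall = dummy_hypercall,
};
```

Because every slot is filled, any code path that reaches the backend gets a
clean ENOSYS failure instead of a NULL pointer call.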

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: Use XC_PAGE_{SHIFT,MASK}.
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: Use XC_PAGE_{SHIFT,MASK}.

Avoid dependency on xc_private.h

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: allow osdep backends to log via the xc infrastructure.
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: allow osdep backends to log via the xc infrastructure.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: drop fd from xc_interface
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: drop fd from xc_interface

Transition to xc_osdep_handle is now complete and nothing uses
(or should be using) it.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: add ability to query OS interface for "fakeness"
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: add ability to query OS interface for "fakeness"

i.e. not running on a real hypervisor

Allows users of the library to adjust behaviour. I don't especially
like this violation of the abstraction but both oxenstored and xapi
use this to avoid difficult to simulate operations when running on the
simulator.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_gnttab_set_max_grants()
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_gnttab_set_max_grants()

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_gnttab_munmap()
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_gnttab_munmap()

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_gnttab_map_{grant_ref,grant_refs,domain_grant_refs}()
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_gnttab_map_{grant_ref,grant_refs,domain_grant_refs}()

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_evtchn_{pending,unmask}()
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_evtchn_{pending,unmask}()

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_evtchn_unbind()
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_evtchn_unbind()

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_evtchn_bind_virq()
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_evtchn_bind_virq()

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_evtchn_bind_interdomain()
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_evtchn_bind_interdomain()

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_evtchn_bind_unbound_port()
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_evtchn_bind_unbound_port()

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_evtchn_notify()
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_evtchn_notify()

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_evtchn_fd()
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_evtchn_fd()

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_map_foreign_ranges()
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_map_foreign_ranges()

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_map_foreign_range()
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_map_foreign_range()

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert xc_map_foreign_{batch,bulk}
Ian Campbell [Fri, 3 Dec 2010 09:36:47 +0000 (09:36 +0000)]
libxc: osdep: convert xc_map_foreign_{batch,bulk}

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: convert do_xen_hypercall()
Ian Campbell [Fri, 3 Dec 2010 09:36:46 +0000 (09:36 +0000)]
libxc: osdep: convert do_xen_hypercall()

do_privcmd() was only ever used by do_xen_hypercall() so remove it.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: osdep: add framework for abstracting access to dom0 OS hypervisor interfaces.
Ian Campbell [Fri, 3 Dec 2010 09:36:46 +0000 (09:36 +0000)]
libxc: osdep: add framework for abstracting access to dom0 OS hypervisor interfaces.

This patch introduces the basic infrastructure and uses it for open
and close operations on privcmd, evtchn and gnttab devices.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson.citrix.com>
15 years agolibxc: convert gnttab interfaces to use an opaque handle type
Ian Campbell [Thu, 23 Dec 2010 15:32:52 +0000 (15:32 +0000)]
libxc: convert gnttab interfaces to use an opaque handle type

The xc_interface previously passed to xc_gnttab_* was only used for
logging which can now be done via the xc_gnttab handle instead.

This makes the interface consistent with the changes made to the main
interface in 21483:779c0ef9682c.

Also update QEMU_TAG to pull in the corresponding qemu change.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agolibxc: convert evtchn interfaces to use an opaque handle type
Ian Campbell [Thu, 23 Dec 2010 15:25:57 +0000 (15:25 +0000)]
libxc: convert evtchn interfaces to use an opaque handle type

This makes the interface consistent with the changes made to the main
interface in 21483:779c0ef9682c.

Also fix some references to "struct xc_interface" which should have
been simply "xc_interface" in tools/xenpaging, and update QEMU_TAG to
pull in the corresponding qemu change.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agolibxc: some xc_gnttab_* functions are not Linux specific
Ian Campbell [Thu, 23 Dec 2010 15:08:21 +0000 (15:08 +0000)]
libxc: some xc_gnttab_* functions are not Linux specific

They simply make hypercalls and perform other operations via the
abstract interface. Create xc_gnttab.c and move those functions there.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agotools/hotplug: read /etc/default/xencommons if appropriate
Ian Jackson [Wed, 22 Dec 2010 17:48:31 +0000 (17:48 +0000)]
tools/hotplug: read /etc/default/xencommons if appropriate

Since 22187:c41252a55a0a we have been installing our example
xencommons settings file in either /etc/sysconfig or /etc/default,
depending on whether /etc/sysconfig exists.

However I omitted to add the code to /etc/init.d/xencommons to
actually read either version of the file, although every other init
script seems to have it.

An effect of this misplaced/unread file is that the automatic tests
don't cause xenconsoled to collect serial logs, because the tester
edits whichever file actually exists.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agoEPT/VT-d: disable page sharing by default.
Keir Fraser [Tue, 21 Dec 2010 18:10:46 +0000 (18:10 +0000)]
EPT/VT-d: disable page sharing by default.

Currently sharing these page tables causes a hang on boot on some
hardware. Disable by default until this is resolved.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
15 years agox86 hvm ept: Remove EPT guest linear address validation
Keir Fraser [Tue, 21 Dec 2010 18:09:34 +0000 (18:09 +0000)]
x86 hvm ept: Remove EPT guest linear address validation

For an EPT violation resulting from an attempt to load the guest PDPTEs
as part of the execution of the MOV CR instruction, the EPT_GLA_VALID
is not valid.  This should not happen in most situations, since we
always populate guest memory. But this is not true for a PAE guest
under PoD or page sharing. In that case, a page pointed to by CR3 may
be unpopulated, and we need to handle it.

Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
15 years agotools/hotplug/Linux: Avoid dependency on iptables conntrack module.
Keir Fraser [Fri, 17 Dec 2010 16:12:37 +0000 (16:12 +0000)]
tools/hotplug/Linux: Avoid dependency on iptables conntrack module.

Checking for RELATED,ESTABLISHED traffic being sent to a domU requires
connection tracking, which adds unexpected (to most users) load to
dom0. Heavily loaded systems can fill the conntrack tables.

So avoid this, be more liberal in what we accept, and leave it to domU
to police its own input.

Signed-off-by: Keir Fraser <keir@xen.org>
15 years agox86, atomic: Fix 32-bit version of atomic_write64().
Keir Fraser [Fri, 17 Dec 2010 14:16:41 +0000 (14:16 +0000)]
x86, atomic: Fix 32-bit version of atomic_write64().

Signed-off-by: Keir Fraser <keir@xen.org>
15 years agox86 ept: Define and use atomic_{read,write}_ept_entry().
Keir Fraser [Fri, 17 Dec 2010 11:51:10 +0000 (11:51 +0000)]
x86 ept: Define and use atomic_{read,write}_ept_entry().

Signed-off-by: Keir Fraser <keir@xen.org>
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
15 years agovtd: Reinstate ACPI DMAR on system shutdown or S3/S4/S5.
Keir Fraser [Fri, 17 Dec 2010 10:46:43 +0000 (10:46 +0000)]
vtd: Reinstate ACPI DMAR on system shutdown or S3/S4/S5.

Signed-off-by: Keir Fraser <keir@xen.org>
15 years agox86 hvm: Move CPUID.0xd (XSAVE) configuration into libxc.
Keir Fraser [Fri, 17 Dec 2010 09:54:22 +0000 (09:54 +0000)]
x86 hvm: Move CPUID.0xd (XSAVE) configuration into libxc.

Signed-off-by: Keir Fraser <keir@xen.org>
15 years agox86:xsaveopt: Enable xsaveopt feature in Xen and guest
Keir Fraser [Fri, 17 Dec 2010 09:25:00 +0000 (09:25 +0000)]
x86:xsaveopt: Enable xsaveopt feature in Xen and guest

This patch uses "xsaveopt" instead of "xsave" if the feature is
supported in hardware to optimize task switch performance in Xen. It
also exposes the feature to guest VMs.

Signed-off-by: Zhang Fengzhe <fengzhe.zhang@intel.com>
15 years agoxentrace: Clean up initialisation.
Keir Fraser [Thu, 16 Dec 2010 20:07:03 +0000 (20:07 +0000)]
xentrace: Clean up initialisation.

Allocate no memory and print no debug messages when disabled.

Signed-off-by: Keir Fraser <keir@xen.org>
15 years agovtd: Clean up a recent printk message.
Keir Fraser [Thu, 16 Dec 2010 20:06:36 +0000 (20:06 +0000)]
vtd: Clean up a recent printk message.

Signed-off-by: Keir Fraser <keir@xen.org>
15 years agox86: Define pte_{read,write}[_atomic] in terms of atomic_readN
Keir Fraser [Thu, 16 Dec 2010 19:36:35 +0000 (19:36 +0000)]
x86: Define pte_{read,write}[_atomic] in terms of atomic_readN

Signed-off-by: Keir Fraser <keir@xen.org>
15 years agox86: Define atomic_{read,write}{8,16,32,64} accessor functions.
Keir Fraser [Thu, 16 Dec 2010 19:29:08 +0000 (19:29 +0000)]
x86: Define atomic_{read,write}{8,16,32,64} accessor functions.

These absolutely guarantee to read/write a uint*_t with a single atomic
processor instruction.

Also re-define atomic_read/atomic_write (act on atomic_t) similarly.
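
As a portable approximation of such accessors, the sketch below uses the
GCC/Clang __atomic builtins with relaxed ordering. This is an assumption made
for illustration: the sketch_ names are invented, and whether each access
compiles to a single instruction depends on the compiler and target (for
example, 64-bit accesses on a 32-bit build need special handling):

```c
#include <stdint.h>

/* Generate a read/write pair for one fixed-width unsigned type. */
#define DEFINE_ATOMIC_ACCESSORS(bits)                                   \
static inline uint##bits##_t sketch_atomic_read##bits(                  \
    const volatile uint##bits##_t *addr)                                \
{                                                                       \
    return __atomic_load_n(addr, __ATOMIC_RELAXED);                     \
}                                                                       \
static inline void sketch_atomic_write##bits(                           \
    volatile uint##bits##_t *addr, uint##bits##_t val)                  \
{                                                                       \
    __atomic_store_n(addr, val, __ATOMIC_RELAXED);                      \
}

DEFINE_ATOMIC_ACCESSORS(8)
DEFINE_ATOMIC_ACCESSORS(16)
DEFINE_ATOMIC_ACCESSORS(32)
DEFINE_ATOMIC_ACCESSORS(64)
```

The point of fixed-width accessors is that a reader can never observe a
half-written value, which matters for things like page table entries.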

Signed-off-by: Keir Fraser <keir@xen.org>
15 years agox86/bitops.h: Remove unused smp_mb__* macros
Keir Fraser [Thu, 16 Dec 2010 19:04:11 +0000 (19:04 +0000)]
x86/bitops.h: Remove unused smp_mb__* macros

Signed-off-by: Keir Fraser <keir@xen.org>
15 years agox86/atomic.h: Clean up for Xen code style; remove unused smp_mb__*
Keir Fraser [Thu, 16 Dec 2010 19:01:35 +0000 (19:01 +0000)]
x86/atomic.h: Clean up for Xen code style; remove unused smp_mb__*

Signed-off-by: Keir Fraser <keir@xen.org>
15 years agox86: Remove unnecessary LOCK/LOCK_PREFIX macros.
Keir Fraser [Thu, 16 Dec 2010 18:46:55 +0000 (18:46 +0000)]
x86: Remove unnecessary LOCK/LOCK_PREFIX macros.

We don't support !CONFIG_SMP.

Signed-off-by: Keir Fraser <keir@xen.org>
15 years agox86: move early page fault code into .init.text
Keir Fraser [Thu, 16 Dec 2010 18:37:30 +0000 (18:37 +0000)]
x86: move early page fault code into .init.text

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agox86/asm: allow some unlikely taken branches to be statically predicted this way
Keir Fraser [Thu, 16 Dec 2010 18:37:20 +0000 (18:37 +0000)]
x86/asm: allow some unlikely taken branches to be statically predicted this way

... by moving the respective code out of line (into sub-section 1 of
the particular section). A few other branches could be eliminated
altogether.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agotools/xcutils: xc_save: add missing whitespace
Olaf Hering [Thu, 16 Dec 2010 18:25:33 +0000 (18:25 +0000)]
tools/xcutils: xc_save: add missing whitespace

Add missing whitespace between the two error strings.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
15 years agotools/libxc: fix comment typo in xc_domain_save
Olaf Hering [Thu, 16 Dec 2010 18:24:57 +0000 (18:24 +0000)]
tools/libxc: fix comment typo in xc_domain_save

evey -> every

Signed-off-by: Olaf Hering <olaf@aepfle.de>
15 years agolibxl: constify libxl_create_cpupool()
Christoph Egger [Thu, 16 Dec 2010 18:24:04 +0000 (18:24 +0000)]
libxl: constify libxl_create_cpupool()

Attached patch constifies libxl_create_cpupool().

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agotools, bsd: complete implementation of discard_file_cache
Christoph Egger [Thu, 16 Dec 2010 18:21:56 +0000 (18:21 +0000)]
tools, bsd: complete implementation of discard_file_cache

Attached patch completes discard_file_cache() for NetBSD.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agoxenstore: set implicit path for socket connections
Ian Campbell [Thu, 16 Dec 2010 18:04:08 +0000 (18:04 +0000)]
xenstore: set implicit path for socket connections

For now assume all such connections come from domain 0.

Failure to do this breaks various scripts which assume that they
operate relative to the domain's "home directory".

This matches the behaviour of the ocaml xenstored.

Thanks to Olaf Hering for the report.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agotools/hotplug: Do not recursively invoke xenstore_write on error
Ian Campbell [Thu, 16 Dec 2010 17:58:00 +0000 (17:58 +0000)]
tools/hotplug: Do not recursively invoke xenstore_write on error

This fixes a possible infinite recursion.

From: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agotools/xl: fix race which can leave an xl monitor process hanging
Ian Jackson [Thu, 16 Dec 2010 17:39:24 +0000 (17:39 +0000)]
tools/xl: fix race which can leave an xl monitor process hanging

If the domain is destroyed (eg with xl destroy), it is possible that
the xl which is monitoring the domain for restart/preserve will not be
able to get the domain shutdown reason.

Before this patch, it would then ignore the domain death event and
carry on waiting, forever, for more events.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agoQEMU_TAG update
Ian Jackson [Thu, 16 Dec 2010 16:58:39 +0000 (16:58 +0000)]
QEMU_TAG update

15 years agoaudit_p2m: fix syntax errors in disabled debug code
Keir Fraser [Thu, 16 Dec 2010 15:41:52 +0000 (15:41 +0000)]
audit_p2m: fix syntax errors in disabled debug code

If P2M_PRINTK is re-defined as ordinary printk,
some (disabled) debug statements will not compile.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
15 years agovtd: Require unmap_vtd_domain_page() on a couple of early exit paths.
Keir Fraser [Thu, 16 Dec 2010 15:38:57 +0000 (15:38 +0000)]
vtd: Require unmap_vtd_domain_page() on a couple of early exit paths.

From: Jan Beulich <JBeulich@novell.com>
Signed-off-by: Keir Fraser <keir@xen.org>
15 years agoMerge
Ian Jackson [Wed, 15 Dec 2010 16:49:25 +0000 (16:49 +0000)]
Merge

15 years agotools/hotplug/Linux: force release lock if holder process is gone (fix)
Kouya Shimura [Wed, 15 Dec 2010 16:49:06 +0000 (16:49 +0000)]
tools/hotplug/Linux: force release lock if holder process is gone (fix)

22508:57907b28e51a was unsafe for mutual exclusion.  There is a case
where the owner file doesn't exist yet when an atomic mkdir operation
fails.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agosvm: support VMCB cleanbits
Keir Fraser [Wed, 15 Dec 2010 16:34:11 +0000 (16:34 +0000)]
svm: support VMCB cleanbits

Attached patch implements the VMCB cleanbits SVM feature.
Upcoming AMD CPUs introduce them; they are basically hints
telling the CPU which VMCB values can be re-used from the previous
VMRUN instruction.

Each bit represents a certain set of fields in the VMCB.
Setting a bit tells the cpu it can re-use the cached value
from the previous VMRUN.
Clearing a bit tells the cpu to reload the values from the given VMCB.
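
The bookkeeping this implies can be sketched as follows. The structure, the
bit assignments and the helper names are hypothetical and chosen only to
show the set/clear discipline:

```c
#include <stdint.h>

/* Hypothetical clean bits, one per group of VMCB fields. */
#define CLEAN_INTERCEPTS ((uint32_t)1 << 0)
#define CLEAN_ASID       ((uint32_t)1 << 2)
#define CLEAN_ALL        (CLEAN_INTERCEPTS | CLEAN_ASID)

struct sketch_vmcb {
    uint32_t cleanbits;   /* set bit: cpu may re-use its cached copy */
    uint32_t intercepts;
    uint32_t asid;
};

/* Writing a field clears its bit so the cpu reloads it on next VMRUN. */
void vmcb_set_asid(struct sketch_vmcb *v, uint32_t asid)
{
    v->asid = asid;
    v->cleanbits &= ~CLEAN_ASID;
}

void vmcb_set_intercepts(struct sketch_vmcb *v, uint32_t intercepts)
{
    v->intercepts = intercepts;
    v->cleanbits &= ~CLEAN_INTERCEPTS;
}

/* After VMRUN the cached state matches the VMCB again. */
void vmcb_after_vmrun(struct sketch_vmcb *v)
{
    v->cleanbits = CLEAN_ALL;
}
```

Routing every field write through a setter is what keeps the hints honest:
a field can never change without its bit being cleared.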

Signed-off-by: Wei Huang <Wei.Huang2@amd.com>
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Keir Fraser <keir@xen.org>
15 years agoEPT/VT-d page table sharing
Keir Fraser [Wed, 15 Dec 2010 14:16:03 +0000 (14:16 +0000)]
EPT/VT-d page table sharing

Basic idea is to leverage 2MB and 1GB page size support in EPT by having
VT-d use the same page tables as EPT.  When an EPT page table changes,
flush the VT-d IOTLB.

Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
15 years agoblktap2: fix up non-ASCII characters in 21129:bf74d9c31674
Ian Jackson [Wed, 15 Dec 2010 13:34:26 +0000 (13:34 +0000)]
blktap2: fix up non-ASCII characters in 21129:bf74d9c31674

21129:bf74d9c31674 contained UTF-8-encoded nonbreaking spaces.
Sorry for not noticing this.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
15 years agox86/mm: make paging_map_log_dirty_bitmap() static
Keir Fraser [Wed, 15 Dec 2010 12:12:30 +0000 (12:12 +0000)]
x86/mm: make paging_map_log_dirty_bitmap() static
now that its only caller outside paging.c has been removed.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
15 years agox86/mm: fix up paging_mfn_is_dirty()
Keir Fraser [Wed, 15 Dec 2010 12:12:15 +0000 (12:12 +0000)]
x86/mm: fix up paging_mfn_is_dirty()

Add locking, and don't allocate the top-level page if it's not there.
Also gets rid of the default-to-1 case if there have been failed
allocations because the safer thing is actually to return 0 and avoid
modifying an un-dirtied page.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
15 years agox86/mm: move mfn_is_dirty along with the rest of the log-dirty code
Keir Fraser [Wed, 15 Dec 2010 12:11:57 +0000 (12:11 +0000)]
x86/mm: move mfn_is_dirty along with the rest of the log-dirty code

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
15 years agox86/32on64: zero-extend hypercall index before use in memory access (debug mode only)
Keir Fraser [Wed, 15 Dec 2010 12:10:31 +0000 (12:10 +0000)]
x86/32on64: zero-extend hypercall index before use in memory access (debug mode only)

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agox86-64: fix restoring of hypercall arguments after trace callout
Keir Fraser [Wed, 15 Dec 2010 12:09:41 +0000 (12:09 +0000)]
x86-64: fix restoring of hypercall arguments after trace callout

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agoReduce side effects of handling '*' debug key
Keir Fraser [Wed, 15 Dec 2010 12:04:34 +0000 (12:04 +0000)]
Reduce side effects of handling '*' debug key

NMI watchdog should be suppressed when dumping IRQ handlers. Softirqs
should be handled periodically while processing non-IRQ handlers.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agox86: adjust other interrupt related section placement
Keir Fraser [Wed, 15 Dec 2010 11:59:00 +0000 (11:59 +0000)]
x86: adjust other interrupt related section placement

... and remove altogether some variables whose values are never
used.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agox86: adjust x2apic section placement
Keir Fraser [Wed, 15 Dec 2010 11:57:54 +0000 (11:57 +0000)]
x86: adjust x2apic section placement

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agox86: x2apic pre-enabled but intr-remapping is not enabled
Keir Fraser [Wed, 15 Dec 2010 11:56:25 +0000 (11:56 +0000)]
x86: x2apic pre-enabled but intr-remapping is not enabled

Align this with the Linux kernel.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Removed unnecessary bits from the original patch, and removed
intremap_enabled() with its only caller gone.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agox86: increase MAX_LOCAL_APIC
Keir Fraser [Wed, 15 Dec 2010 11:55:48 +0000 (11:55 +0000)]
x86: increase MAX_LOCAL_APIC

Otherwise apicid_to_node[MAX_LOCAL_APIC] will be overrun if apicid >
255. With this patch, the mapping is correct.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Make this and also MAX_MADT_ENTRIES loosely depend on NR_CPUS. Tie
MAX_APICS to MAX_LOCAL_APIC. Fix initializer of x86_acpiid_to_apicid[]
to match the array member type of u32, as well as all checks in
readers of this array and x86_cpu_to_apicid[].

While the adjustment to xen_vcpu_physid_to_x86_{acpi,apic}id() is not
backward compatible, I think it should still be done this way as the
former reserving of values beyond 0xff should never have been part of
the interface. If considered impossible, a second best solution would
appear to be to make the macros depend on __XEN_INTERFACE_VERSION__.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
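
The overrun described in the "x86: increase MAX_LOCAL_APIC" message above
can be illustrated with a bounds-checked table whose size scales with the
CPU count. This is a sketch under stated assumptions: the demo_ names, the
scaling formula and the "no node" marker are invented for the example and
are not taken from the patch.

```c
#include <stdint.h>

/* Assumed build-time CPU count; the real NR_CPUS is a Xen config value. */
#define DEMO_NR_CPUS        256u

/* Assumed scaling rule: never below 256, otherwise four IDs per CPU. */
#define DEMO_MAX_LOCAL_APIC \
    ((DEMO_NR_CPUS * 4u) > 256u ? (DEMO_NR_CPUS * 4u) : 256u)

/* Hypothetical marker returned for APIC IDs outside the table. */
#define DEMO_NO_NODE        0xFFu

/* APIC ID -> NUMA node map, indexed by a full 32-bit APIC ID. */
static uint8_t demo_apicid_to_node[DEMO_MAX_LOCAL_APIC];

/* Record the node for an APIC ID; returns -1 instead of writing past the end. */
static inline int demo_set_node(uint32_t apicid, uint8_t node)
{
    if (apicid >= DEMO_MAX_LOCAL_APIC)
        return -1;
    demo_apicid_to_node[apicid] = node;
    return 0;
}

/* Look up the node for an APIC ID; out-of-range IDs yield DEMO_NO_NODE. */
static inline uint8_t demo_node_of(uint32_t apicid)
{
    return apicid < DEMO_MAX_LOCAL_APIC ? demo_apicid_to_node[apicid]
                                        : (uint8_t)DEMO_NO_NODE;
}
```

With a 256-entry table, an x2APIC ID such as 300 would index past the end;
here it is either inside the enlarged table or rejected by the check.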
15 years agox86: mpparse and cstate need to use 32bit apic id
Keir Fraser [Wed, 15 Dec 2010 11:52:14 +0000 (11:52 +0000)]
x86: mpparse and cstate need to use 32bit apic id

Instead of going with the mpc_config_processor struct, whose field
has only 8 bits.

We should not change that struct, because it is shared with mptable.

Also need to increase MAX_APICS.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

Rather than using a fixed value of 512, make this scale with NR_CPUS
(which obviously still doesn't cover all theoretically possible
systems, but at least allows some build time control).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
15 years agox86: Fix multicall handling for 6-arg hypercalls.
Keir Fraser [Wed, 15 Dec 2010 11:21:28 +0000 (11:21 +0000)]
x86: Fix multicall handling for 6-arg hypercalls.

None exist at the moment, but this makes multicall handling consistent
with direct PV and HVM hypercall handling.

Signed-off-by: Keir Fraser <keir@xen.org>
15 years agox86 hvm: Enable 6-argument hypercalls
Keir Fraser [Wed, 15 Dec 2010 11:17:41 +0000 (11:17 +0000)]
x86 hvm: Enable 6-argument hypercalls

Enable 6 argument hypercalls for HVMs. The hypercall code handles a
sixth argument in EBP or R9 but the HVM code is not passing the value.

Signed-off-by: Ross Philipson <ross.philipson@citrix.com>
15 years agox86 hvm: Expose TSC_DEADLINE CPU feature to guests via CPUID.
Keir Fraser [Wed, 15 Dec 2010 11:09:02 +0000 (11:09 +0000)]
x86 hvm: Expose TSC_DEADLINE CPU feature to guests via CPUID.

Signed-off-by: Wei Gang <gang.wei@intel.com>
15 years agox86 hvm: Emulate MSR_IA32_TSC_DEADLINE
Keir Fraser [Wed, 15 Dec 2010 11:01:59 +0000 (11:01 +0000)]
x86 hvm: Emulate MSR_IA32_TSC_DEADLINE

Accesses to MSR_IA32_TSC_DEADLINE are trapped, with the value stored in a
new field vlapic->hw.tdt_msr. vlapic->pt is reused in one-shot mode
for vtdt to trigger expiry events.

For details, please refer to the Intel Architectures Software
Developer's Manual 3A, 10.5.4.1 TSC-Deadline Mode.

Signed-off-by: Wei Gang <gang.wei@intel.com>
15 years agox86: Define APIC_TIMER_MODE_xxx in apicdef.h
Keir Fraser [Wed, 15 Dec 2010 10:55:34 +0000 (10:55 +0000)]
x86: Define APIC_TIMER_MODE_xxx in apicdef.h

Signed-off-by: Wei Gang <gang.wei@intel.com>
15 years agox86: Define a new function gtsc_to_gtime()
Keir Fraser [Wed, 15 Dec 2010 10:49:57 +0000 (10:49 +0000)]
x86: Define a new function gtsc_to_gtime()

It converts a guest TSC value to guest time.

Fix the typo in gtime_to_gtsc() definition.

Signed-off-by: Wei Gang <gang.wei@intel.com>
15 years agoept: Remove lock in ept_get_entry, replace with access-once semantics.
Keir Fraser [Wed, 15 Dec 2010 10:47:05 +0000 (10:47 +0000)]
ept: Remove lock in ept_get_entry, replace with access-once semantics.

This mirrors the RVI/shadow situation, where p2m read access is
lockless because it's done in the hardware (linear map of the p2m
table).

This fixes the original bug (call it bug A) without introducing bug B
(a deadlock).

Bug A was caused by a race when updating p2m entries: between testing
if it's valid, and testing if it's populate-on-demand, it may have
been changed from populate-on-demand to valid.

My original patch simply introduced a lock into ept_get_entry, but
that caused bug B, a deadlock due to circular locking order between
these two paths:
p2m_change_type [grabs p2m lock] -> set_p2m_entry -> ept_set_entry ->
ept_set_middle_level -> p2m_alloc [grabs hap lock]
write cr4 -> hap_update_paging_modes [grabs hap lock] -> hap_update_cr3 ->
gfn_to_mfn -> ept_get_entry -> [grabs p2m lock]

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
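
The access-once approach described in the "ept: Remove lock in
ept_get_entry" message above can be sketched as follows. This is an
illustration only: the demo_ names and the two flag bits are assumptions
for the example and do not reflect the real EPT entry layout, and the
sketch assumes an aligned 64-bit load is performed as a single access, as
it is on x86-64.

```c
#include <stdint.h>

/* Hypothetical flag bits; the real EPT entry format is different. */
#define DEMO_VALID (1ull << 0)
#define DEMO_POD   (1ull << 1)   /* populate-on-demand */

enum demo_kind { DEMO_KIND_INVALID, DEMO_KIND_VALID, DEMO_KIND_POD };

/*
 * Lockless classification of a table entry.
 *
 * The racy pattern reads *entry once for the "valid" test and again for
 * the "populate-on-demand" test; a writer changing the entry from PoD to
 * valid between the two reads makes both tests fail.  Taking one snapshot
 * and testing only the snapshot gives a consistent answer without a lock.
 */
static inline enum demo_kind demo_classify(const volatile uint64_t *entry)
{
    uint64_t snap = *entry;      /* the only read of the shared entry */

    if (snap & DEMO_VALID)
        return DEMO_KIND_VALID;
    if (snap & DEMO_POD)
        return DEMO_KIND_POD;
    return DEMO_KIND_INVALID;
}
```

Because the reader takes no lock, it cannot take part in the lock-order
cycle that caused bug B.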